Speech/music discrimination : novel features in time domain
نویسنده
چکیده
This research aimed to find novel features that can be used to discriminate between speech and music in the time domain for the purpose of data retrieval. The study used speech and music data that were recorded in standard anechoic chambers and sampled at 44.1 kHz. Two types of new features were found and thoroughly examined: the Ratio of Silent Frames (RSF) feature and the Time Series Events (TSE) set of features. The Receiver Operating Characteristics (ROC) curves were used to assess each one of the proposed features as well as certain relevant features from the literature for the purpose of comparison. The RSF feature introduced up to 8% enhancement when compared to a couple of relevant features from the literature. One of the TSE set of features provided close to 100% speech/music discrimination.
منابع مشابه
Robust Features for Effective Speech and Music Discrimination
Speech and music discrimination is one of the most important issues for multimedia information retrieval and efficient coding. While many features have been proposed, seldom of which show robustness under noisy condition, especially in telecommunication applications. In this paper two novel features based on real cepstrum are presented to represent essential differences between music and speech...
متن کاملAudio classification using dominant spatial patterns in time-frequency space
This paper presents a novel audio discrimination algorithm using spatial features in time-frequency (TF) space. Three types of audio signals – speech, music without vocal and music with background vocal are taken into consideration for classification. The audio segment is transformed into TF domain yielding the spatial illustration of energy. Nonnegative matrix factorization (NMF) is applied to...
متن کاملSpeech/music discrimination for multimedia applications
Automatic discrimination of speech and music is an important tool in many multimedia applications. Previous work has focused on using long-term features such as differential parameters, variances, and time-averages of spectral parameters. These classifiers use features estimated over windows of 0.5–5 seconds, and are relatively complex. In this paper, we present our results of combining the lin...
متن کاملPitch expertise is not created equal: Cross-domain effects of musicianship and tone language experience on neural and behavioural discrimination of speech and music.
Psychophysiological evidence supports a music-language association, such that experience in one domain can impact processing required in the other domain. We investigated the bidirectionality of this association by measuring event-related potentials (ERPs) in native English-speaking musicians, native tone language (Cantonese) nonmusicians, and native English-speaking nonmusician controls. We te...
متن کاملA Novel Frequency Domain Linearly Constrained Minimum Variance Filter for Speech Enhancement
A reliable speech enhancement method is important for speech applications as a pre-processing step to improve their overall performance. In this paper, we propose a novel frequency domain method for single channel speech enhancement. Conventional frequency domain methods usually neglect the correlation between neighboring time-frequency components of the signals. In the proposed method, we take...
متن کامل